workaround for train_set batching during inference time #784

varisd · 2019-01-09T15:37:09Z

We can currently use the `max_len` parameter in decoders to avoid out-of-memory (OOM) errors at inference time.
However, this is not enough, e.g. when `batch_size` is specified as a number of tokens rather than a number of sentences.
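For context, here is a minimal sketch of what token-level batching means. It assumes padded batches, so a batch costs `batch_size * longest_sequence_in_batch` tokens. The helper `token_batches` and its greedy grouping are hypothetical illustrations, not Neural Monkey's actual batching code.

```python
from typing import List


def token_batches(sequences: List[List[str]], token_budget: int) -> List[List[List[str]]]:
    """Greedily group sequences so each padded batch has at most `token_budget`
    token positions (batch size * longest sequence in the batch).

    Hypothetical helper for illustration only. A single sequence longer than
    the budget still forms its own (oversized) batch.
    """
    batches: List[List[List[str]]] = []
    current: List[List[str]] = []
    longest = 0
    for seq in sequences:
        new_longest = max(longest, len(seq))
        # Padded size of the current batch if this sequence were added to it.
        if current and (len(current) + 1) * new_longest > token_budget:
            batches.append(current)
            current, new_longest = [], len(seq)
        current.append(seq)
        longest = new_longest
    if current:
        batches.append(current)
    return batches


# Five two-token sentences with a budget of 9 tokens: 4 * 2 = 8 fits, 5 * 2 = 10 does not.
print([len(b) for b in token_batches([["a", "b"]] * 5, 9)])  # [4, 1]
```

The budget is checked only against the *input* lengths, which is exactly why the decoder's output length is not accounted for.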

For simplicity, imagine a decoder-only scenario. Let the token-level batch_size be 9, which barely fits into memory, and let max_len=3. We can get an input batch of shape [4, 2] (batch_size, seq_len), i.e. 8 tokens, which fits the budget. During inference, however, the decoder can easily generate a result of shape [4, 3], i.e. 12 tokens, which exceeds the budget and causes OOM.
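The arithmetic above can be checked with a short sketch. It assumes, hypothetically, that memory use is proportional to the padded size `batch_size * seq_len` and that the decoder may run for the full `max_len` steps. One conceivable mitigation, not necessarily the one this PR implements, is to cap the inference-time batch size at `token_budget // max_len`; all function names here are hypothetical.

```python
def padded_tokens(batch_size: int, seq_len: int) -> int:
    """Number of (padded) token positions in a [batch_size, seq_len] tensor."""
    return batch_size * seq_len


def fits(batch_size: int, seq_len: int, token_budget: int) -> bool:
    """True if a [batch_size, seq_len] tensor stays within the token budget."""
    return padded_tokens(batch_size, seq_len) <= token_budget


def max_inference_batch_size(token_budget: int, max_len: int) -> int:
    """Largest batch size whose worst-case decoder output still fits.

    Hypothetical workaround: assume every sequence decodes for max_len steps.
    Returns at least 1, so if max_len alone exceeds the budget this cannot
    prevent OOM.
    """
    return max(1, token_budget // max_len)


TOKEN_BUDGET = 9  # token-level batch_size from the example
MAX_LEN = 3       # decoder max_len from the example

print(fits(4, 2, TOKEN_BUDGET))        # True: 8 tokens in the input batch
print(fits(4, MAX_LEN, TOKEN_BUDGET))  # False: 12 tokens after decoding to max_len
print(max_inference_batch_size(TOKEN_BUDGET, MAX_LEN))  # 3
```

Under this assumption the [4, 2] batch from the example would be decoded as a batch of 3 and a batch of 1, trading some throughput for a worst case that stays within the budget.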

This PR suggests one possible solution and is open for discussion.

varisd added the bug and discussion labels on Jan 9, 2019

Commit 0f5649c: workaround for train_set batching during inference time

varisd force-pushed the batching_workaround branch from 299c1bc to 0f5649c on February 21, 2019 16:13
